Abstract

In this paper, we empirically investigate correlations among four centrality measures, which
originated in the social sciences, for various complex networks. For each network, we compute
the centrality measures, from which the correlation coefficients and the partial correlations
among the measures are estimated. We find that the degree and the betweenness centrality are
highly correlated; furthermore, the betweenness follows a power-law distribution irrespective
of the type of network. This characteristic is further examined in terms of the conditional
probability distribution of the betweenness, given the degree. The conditional distribution
also exhibits power-law behavior independent of the degree, which explains in part, if not in
whole, the origin of the power-law distribution of the betweenness. A similar analysis of the
random network reveals that these characteristics are not found there.
I. INTRODUCTION



The network (or graph) is a useful way of expressing and quantitatively investigating
the characteristics of complex systems in various disciplines. It consists of a set of vertices
representing entities, and edges representing connections between vertices. Numerous
complex systems can be and have been expressed in terms of networks, and they are often
classified by research field, such as social, technological [5], and biological
networks, to name just a few.

Early research on networks focused mainly on regular and random networks, from which
many mathematical results on general structural characteristics have been extracted.
Recently, owing to the availability of computers and the Internet, the study of large-scale
statistical properties of complex networks has become possible. It was found that many
complex networks have distinctive features in common, such as the power-law distribution
of the degree and the clique of the network, resulting in the scale-free [9] and the
small-world networks. These characteristics, which differ from those of regular and random
networks, triggered considerable advances in the understanding of complex networks,
including the development of numerous analysis tools and of more accurate topological
models for the observed networks.

More recently, research on complex networks has also turned toward the community structure
of networks. It has been shown that various complex networks can be organized in terms of
community structure (or modularity), in which groups of highly interconnected vertices
have looser connections between them. The analysis of these structures has been a topic of
intensive investigation because of its practical relevance, such as finding functional
modules in biological networks and identifying communities in the Web in order to
construct, for instance, an efficient search engine [13].



Various attempts have been made to identify community structures in complex networks [14].
Examples include hierarchical clustering; methods based on the edge betweenness, on edge
clustering via the degree, on the information centrality [19], and on the eigenvector
centrality [20]; and the information-theoretic approach via the degree [21]. These methods
are directly or indirectly related to the centrality measures. Considering that the
resulting community structure depends on the choice of measures (including the centrality
measures) adopted in the various schemes, it is of interest to investigate the relations
among the centrality measures.

The centrality (or sociometric status) has been studied particularly in the social sciences
from the perspective of social connectivity. It is an incarnation of a concept that
describes a vertex's prominence and/or importance in terms of features of its network
environment [22]. It addresses the question of which individuals are best connected to
others or have the most influence. This relative importance has been quantified by various
measures, developed mainly by researchers of social networks [23]. Among the different
measures proposed in the social sciences, four are commonly used in network analysis:
the degree, the closeness, the betweenness, and the eigenvector centrality.

In this paper, we empirically investigate correlations among the centrality measures in
complex networks to gain insight into the potential role of the measures in analyzing
complex networks. We restrict our analysis to undirected networks, since some of the
centrality measures, such as the eigenvector centrality, cannot be defined unambiguously for
directed networks. We analyze the film actor network, the scientific collaboration network,
the neural network of Caenorhabditis elegans, the Internet at both the Autonomous System (AS)
and the router levels, and protein interaction networks. The organisms analyzed for the
protein interaction networks are Saccharomyces cerevisiae, Escherichia coli, Caenorhabditis
elegans, Drosophila melanogaster, Helicobacter pylori, and Homo sapiens [25].



II. CENTRALITY MEASURES 



The centrality measures are introduced as a way of specifying and quantifying the
centrality of a vertex in a network. They are often classified according to the extent to
which a vertex influences the others: the immediate effects, the mediative effects, and the
total effects centrality. Typical examples belonging to each class are the closeness and
the degree for the immediate effects, the betweenness for the mediative effects, and the
eigenvector for the total effects centrality. In addition, these measures are argued to be
complementary rather than competitive because they stem from the same theoretical
foundation. Although the measures are well known, we restate them here for completeness,
with emphasis on their implications.

The degree centrality is the most basic of all the measures; it counts how many vertices
are involved in an interaction. It is defined, for a vertex i, as the number of edges that the
vertex has. That is,

$$d_i = \sum_{j=1}^{n} a_{ij} , \tag{1}$$

where n is the number of vertices in the network, and $a_{ij} = 1$ if vertices i and j are
connected by an edge, $a_{ij} = 0$ otherwise. It measures the opportunity to receive
information flowing through the network, everything else being equal. The degree is also a
prominent quantity whose distribution follows a power law in scale-free networks.
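
As a minimal sketch of the degree definition, the row sums of a 0/1 adjacency matrix give the degree of every vertex. The function name and the four-vertex example network are hypothetical and serve only as an illustration.

```python
def degree_centrality(adjacency):
    """Return d_i = sum_j a_ij for every vertex of an undirected network.

    `adjacency` is a symmetric list of lists with entries 0 or 1.
    """
    return [sum(row) for row in adjacency]


# Hypothetical example: a triangle (vertices 0, 1, 2) with a pendant
# vertex 3 attached to vertex 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
print(degree_centrality(A))  # [2, 2, 3, 1]
```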

The eigenvector centrality can be understood as a refined version of the degree centrality
in the sense that it recursively takes into account how neighboring vertices are connected.
That is, the eigenvector centrality $e_i$ of a vertex i is proportional to the sum of the
eigenvector centralities of the vertices it is connected to. It is defined as

$$e_i = \lambda^{-1} \sum_{j} a_{ij} e_j , \tag{2}$$

where $\lambda$ is the largest eigenvalue, chosen to ensure that the centrality is
non-negative. Thus, $e_i$ is the ith component of the eigenvector associated with the largest
eigenvalue $\lambda$ of the network. While the eigenvector centrality of a network can be
calculated by standard methods using the adjacency matrix representation of the network, it
can also be computed by an iterative degree calculation, known as the accelerated power
method. This method is not only more efficient, but also consistent with the spirit of a
refined version of the degree centrality.
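
The iterative idea can be sketched with a plain power iteration, in which every vertex repeatedly adds up the current scores of its neighbors. This is not the accelerated power method mentioned above, only a simpler stand-in. The shift by the identity matrix, the stopping tolerance, and the function name are assumptions of this sketch, and the network is assumed to be connected.

```python
import math


def eigenvector_centrality(adjacency, iterations=1000, tol=1e-12):
    """Approximate the leading eigenvector of a symmetric 0/1 adjacency matrix.

    Iterating with (A + I) instead of A is an assumption of this sketch: it
    leaves the eigenvectors unchanged but avoids oscillation on bipartite
    networks. Returns the unit-length eigenvector and the largest eigenvalue.
    """
    n = len(adjacency)
    e = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # One step: each vertex sums its neighbours' scores plus its own.
        new = [e[i] + sum(adjacency[i][j] * e[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in new))
        new = [x / norm for x in new]
        converged = max(abs(a - b) for a, b in zip(new, e)) < tol
        e = new
        if converged:
            break
    # Rayleigh quotient e.(A e) gives the largest eigenvalue of A itself.
    Ae = [sum(adjacency[i][j] * e[j] for j in range(n)) for i in range(n)]
    lam = sum(x * y for x, y in zip(e, Ae))
    return e, lam


# Hypothetical example: a path of three vertices; the middle vertex scores highest.
scores, lam = eigenvector_centrality([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(scores, lam)
```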

The closeness centrality stems from the notion that the influence of central vertices
spreads more rapidly throughout a network than that of peripheral ones. It is defined,
for each vertex i, as

$$c_i = \Bigl[ \sum_{j=1}^{n} d_{ij} \Bigr]^{-1} , \tag{3}$$

where $d_{ij}$ is the length of the shortest path (geodesic) connecting vertices i and j. Thus,
the closeness is closely associated with the characteristic path length [11], the average
length of the shortest paths between all pairs of vertices.
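
A minimal sketch of the closeness computation uses a breadth-first search from every vertex to obtain the geodesic lengths. Taking the plain reciprocal of the summed distances is an assumption of this sketch; a different constant prefactor, such as n - 1, would not change any correlation coefficient. The function names are hypothetical, and the network is assumed to be connected and unweighted.

```python
from collections import deque


def shortest_path_lengths(neighbors, source):
    """Breadth-first search: geodesic distance from `source` to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(neighbors):
    """Closeness of each vertex as the reciprocal of its total geodesic distance."""
    closeness = {}
    for i in neighbors:
        total = sum(shortest_path_lengths(neighbors, i).values())
        closeness[i] = 1.0 / total if total > 0 else 0.0
    return closeness


# Hypothetical example: a path 0-1-2-3; the inner vertices are closer to the rest.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(closeness_centrality(path))
```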

The betweenness centrality, or the load [28], is a measure of the influence of a vertex over
the flow of information between every pair of vertices under the assumption that information
primarily flows over the shortest path between them. It measures the accumulated number
of information transmissions that pass through the vertex. The removal of high-betweenness
vertices sometimes disconnects a network. The betweenness centrality of a vertex
i is defined as

$$b_i = \sum_{j<k} \frac{g_{jk}(i)}{g_{jk}} , \tag{4}$$

where $g_{jk}$ is the number of geodesics between j and k, and $g_{jk}(i)$ is the number of
those geodesics that pass through i. Since $b_i$ is of the order $O(n^2)$, in this paper we
normalize $b_i$ by its maximum value of $(n-1)(n-2)/2$ so that $b_i \in [0, 1]$ for all i.
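
The normalized betweenness can be sketched as follows for an undirected, unweighted network. The accumulation scheme is Brandes' algorithm, which is an assumption of this sketch, since the text does not state which algorithm was used. The function name and the example networks are hypothetical.

```python
from collections import deque


def betweenness_centrality(neighbors):
    """Shortest-path betweenness of every vertex, divided by (n - 1)(n - 2) / 2.

    `neighbors` maps each vertex to the list of its adjacent vertices.
    """
    b = {v: 0.0 for v in neighbors}
    for s in neighbors:
        # Breadth-first search from s, counting geodesics (sigma) to each vertex.
        order = []
        preds = {v: [] for v in neighbors}
        sigma = {v: 0 for v in neighbors}
        dist = {v: -1 for v in neighbors}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in neighbors[u]:
                if dist[w] < 0:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest vertices, accumulating the fraction of
        # geodesics starting at s that pass through each vertex.
        delta = {v: 0.0 for v in neighbors}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    n = len(neighbors)
    scale = (n - 1) * (n - 2) / 2.0
    # Every unordered pair (j, k) was counted twice, once from each endpoint.
    return {v: (b[v] / 2.0) / scale if scale > 0 else 0.0 for v in neighbors}


# Hypothetical example: a star with centre 0; every geodesic between leaves
# passes through the centre, so its normalized betweenness is 1.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(betweenness_centrality(star))
```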



III. CORRELATION ANALYSIS 



A. Correlation coefficients and partial correlations 

For every network, we compute the four centrality measures so that all four values are
assigned to each vertex. The correlation between a pair of different measures can be
estimated by the correlation coefficient. More specifically, it is a quantity which measures
the linear correlation between vertex-wise pairs of data,
$(A, B) = \{(a_i, b_i),\ i = 1, 2, \ldots, n\}$, and is given as

$$R_{AB} = \frac{1}{n} \sum_{i=1}^{n} \frac{(a_i - \bar{A})(b_i - \bar{B})}{\sigma_A \sigma_B} , \tag{5}$$

where $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of the measurements of a
centrality measure A. The value of $R_{AB}$ ranges from -1 to 1: 1 being totally correlated,
and -1 being totally anti-correlated.
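
A minimal sketch of the correlation coefficient between two vertex-wise lists of centrality values is given below. The function name and the sample values are hypothetical; the standard deviations are taken with a 1/n prefactor.

```python
import math


def correlation_coefficient(a, b):
    """Pearson correlation between two equally long lists of vertex-wise values."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Population standard deviations (divide by n, not n - 1).
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    return cov / (sd_a * sd_b)


# Hypothetical values of two measures on four vertices.
print(correlation_coefficient([1, 2, 3, 4], [1, 3, 2, 4]))
```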

Table I shows the correlation coefficients estimated between pairs of data obtained from
different centrality measures. As shown in Table I, the degree is strongly correlated with
the betweenness and less strongly with the eigenvector centrality, whereas the closeness
is weakly correlated with the other measures. This implies that the three measures (the
degree, the betweenness, and the eigenvector centrality) are closely inter-related. In general,
correlation coefficients estimated from different variables may overlap significantly.
That is, a certain amount of the correlation found between any two measures may be tied in
with their correlations with a third.

To take this point into account, we introduce the partial correlation method. The
partial correlation is a method that determines the correlation between any two variables
under the assumption that each of them is not correlated with the third. That is, it estimates
the correlation between two variables while the third variable is held constant. Formally,
the partial correlation between variables A and B while holding C constant is given in terms
of the corresponding correlation coefficients as

$$R_{AB \cdot C} = \frac{R_{AB} - R_{AC} R_{BC}}{\sqrt{(1 - R_{AC}^2)(1 - R_{BC}^2)}} . \tag{6}$$

We estimate all possible partial correlations for each correlation coefficient, and the
results are shown in parentheses in Table I.
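
As a quick check of how a partial correlation follows from three correlation coefficients, the sketch below applies the standard first-order formula to the rounded film actor coefficients of Table I. The function name is hypothetical, and small deviations from the tabulated values are expected because the inputs are rounded to two decimals.

```python
import math


def partial_correlation(r_ab, r_ac, r_bc):
    """First-order partial correlation between A and B with C held constant."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


# Film actor network, rounded coefficients from Table I:
# X = degree, Y = betweenness, Z = eigenvector centrality.
r_xy, r_xz, r_yz = 0.81, 0.61, 0.26

# Degree and betweenness with the eigenvector centrality held constant.
print(round(partial_correlation(r_xy, r_xz, r_yz), 2))  # 0.85
# Betweenness and eigenvector centrality with the degree held constant.
print(round(partial_correlation(r_yz, r_xy, r_xz), 2))  # -0.5
```

The second value illustrates the sign change discussed in the text: the positive coefficient 0.26 turns negative once the degree is held constant.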

From Table I, we find that the partial correlation between the degree and the betweenness,
while holding either the eigenvector or the closeness constant, differs little from the
correlation coefficient between them. This implies that the strong correlation between the
degree and the betweenness is intrinsic to the two measures and is little affected by
the other measures. In contrast, the partial correlation between the betweenness and
the eigenvector (or the betweenness and the closeness) while holding the degree constant is
negative for most networks. This implies that the positive correlation between the
betweenness and the eigenvector (or the betweenness and the closeness) is almost entirely due
to correlations with the degree. That is, the positive correlation between them would change
dramatically to a negative correlation if they were not correlated with the degree
centrality. Table I also shows that the correlation between the degree and the eigenvector is
affected by the betweenness and the closeness.



B. Probability distribution of the betweenness 

From the correlation analysis, we find that the degree and the betweenness are correlated
much more strongly than the other pairs of centrality measures. This is, in a sense, expected
since vertices of high degree have a better chance of lying on the shortest path between
a pair of vertices. To address the correlation between the degree k and the betweenness b,
we relate them, via the law of total probability, as

$$P(b) = \sum_{k} P(b \mid k) P(k) , \tag{7}$$

and focus on the conditional probability distribution $P(b \mid k)$ of b given k. To obtain
reliable statistics for the conditional distribution, we choose the film actor network as an
example since it has the largest number of vertices (over 370,000) in this study.
Figure 1(a) shows a few conditional probability distributions $P(b \mid k)$. As shown in
Fig. 1(a), the conditional distribution approximately follows a power-law form with an
exponent f(k) depending on k, i.e.,

$$P(b \mid k) \propto b^{-f(k)} . \tag{8}$$

The k-dependent exponent f(k) can be estimated for different degrees k. As Fig. 1(b)
suggests, f(k) depends roughly linearly on k. Thus, we have

$$f(k) \approx \alpha k + \beta , \tag{9}$$

where the parameters $\alpha$ and $\beta$ can be estimated by a least-squares fit.
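
The two fitting steps can be sketched as follows: a straight-line fit on log-log axes gives the exponent of a power law of the form P proportional to b^(-f), and a second straight-line fit of f against k gives the slope and intercept. The function names are hypothetical, and all numbers below are synthetic, chosen only to show that the fit recovers a known exponent. No binning or error estimation is attempted.

```python
import math


def least_squares_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def power_law_exponent(b_values, p_values):
    """Exponent f in P ~ b**(-f), from a straight-line fit of log P against log b."""
    slope, _ = least_squares_line(
        [math.log(b) for b in b_values],
        [math.log(p) for p in p_values],
    )
    return -slope


# Synthetic check: points drawn exactly from b**(-2.5) give back f = 2.5.
b_synthetic = [1e-6, 1e-5, 1e-4, 1e-3]
p_synthetic = [b ** -2.5 for b in b_synthetic]
print(power_law_exponent(b_synthetic, p_synthetic))

# Synthetic exponents lying on a line f(k) = 0.05 k + 2.0.
alpha, beta = least_squares_line([2, 4, 6], [2.1, 2.2, 2.3])
print(alpha, beta)
```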

With Eqs. (8) and (9), the probability distribution P(b) of the betweenness b can be
expressed as

$$P(b) \propto b^{-\beta} \sum_{k} b^{-\alpha k} P(k) . \tag{10}$$

Under the assumption that P(k) does not blow up as k increases, the dominant contribution to
the summation comes from small values of k. Thus, to a first approximation, we find that
the betweenness follows a power-law distribution, independent of the degree distribution.
That is,

$$P(b) \propto b^{-(\alpha + \beta)} , \tag{11}$$

with $\alpha + \beta = 2.89$ for the film actor network.
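
The step from the sum over k to a single power law can be spelled out as below. This is only a sketch of the approximation described in the text: it assumes that the k = 1 term is the leading one among the "small values of k", and it treats the k-dependent normalization constants of the conditional distributions as absorbed into P(k).

```latex
\begin{align}
P(b) &= \sum_{k} P(b \mid k)\, P(k)
      \;\propto\; \sum_{k} b^{-(\alpha k + \beta)}\, P(k)
      \;=\; b^{-\beta} \sum_{k} b^{-\alpha k}\, P(k) \\
     &\approx\; b^{-\beta}\, b^{-\alpha}\, P(1)
      \;\propto\; b^{-(\alpha + \beta)} .
\end{align}
```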

The power-law distribution of the betweenness can also be obtained by a direct estimate
of the betweenness distribution. Figure 2 shows the betweenness probability distributions of
a few networks. Scale-free networks, such as the film actor network and the protein
interaction network of D. melanogaster, have a power law in the distribution of the
betweenness, which was first found in Ref. [28]. Considering that the degree is highly
correlated with the betweenness, it is not surprising that the betweenness of scale-free
networks follows a power-law distribution. From Fig. 2, we also find that the directly
estimated exponent, 2.36, for the film actor network is comparable to the derived exponent of
$\alpha + \beta = 2.89$.

Moreover, Fig. 2 shows that the power-law distribution of the betweenness is not restricted
to scale-free networks, but also holds for other types of networks, such as the collaboration
network and the neural network of C. elegans. As depicted in Fig. 3, the conditional
probability distribution of non-scale-free networks, for instance the collaboration
network, is also approximately a power law; moreover, the exponent of the
distribution is insensitive to the degree k.



The power law of the conditional probability distribution is less clear for networks with a
small number of vertices. This is probably due to an insufficient amount of data to obtain
reliable statistics. We have, however, seen the power law of the conditional distribution
for networks composed of a sufficiently large number of vertices, irrespective of the type of
network. From this, we may infer that it is the power law of the conditional probability
distribution that is responsible for the power-law nature of the betweenness.

For comparison, we apply the same analysis to the random network. Table
II shows the correlation coefficients and partial correlations between measures estimated for
the random network. In contrast to the real networks, every centrality measure is very
strongly correlated with every other measure. This distinctive characteristic, however,
changes dramatically once we introduce the partial correlation. From the partial correlation
estimates, we find that the correlation coefficients between all possible pairs of measures,
except that between the degree and the betweenness, contain a considerable amount of
correlation tied in with the other measures. Similar to the real networks, the strong
correlation between the degree and the betweenness is nearly maintained when these measures
are assumed not to be correlated with the other measures.
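
A random network with a prescribed average degree can be generated as sketched below. The Erdos-Renyi construction, in which every pair of vertices is linked independently with the same probability, is an assumption of this sketch, since the text does not specify the random-network model. The function name and the seed are hypothetical; the size N = 1000 and the average degree 10 follow Table II.

```python
import random


def random_network(n, mean_degree, seed=0):
    """Erdos-Renyi style random network as a dict mapping vertex -> set of neighbours.

    Each of the n(n - 1)/2 possible edges is present independently with
    probability p = mean_degree / (n - 1), so the expected degree is mean_degree.
    """
    rng = random.Random(seed)
    p = mean_degree / (n - 1)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


g = random_network(1000, 10)
average_degree = sum(len(nb) for nb in g.values()) / len(g)
print(average_degree)
```

The resulting neighbor sets can be passed to any centrality routine that accepts a vertex-to-neighbors mapping.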

We also examine the conditional probability distribution of the betweenness given the
degree. A few conditional distributions $P(b \mid k)$ of the betweenness b given the degree k
are depicted in Fig. 4. Unlike in the real networks, the distribution is not a power law, but
approximately Gaussian irrespective of the degree. Since the conditional distribution of
the betweenness given the degree does not follow a power-law distribution, we expect that
the betweenness distribution of the random network also differs from that of the real
networks. As shown in Fig. 5, it turns out that the betweenness distribution for the random
network can be approximated by a log-normal distribution,

$$P(b) = \frac{1}{b \sigma \sqrt{2\pi}} \, e^{-(\ln b - \mu)^2 / 2\sigma^2} , \tag{12}$$

where $\mu$ and $\sigma$ are the scale and the shape parameters of the distribution, respectively.

IV. SUMMARY AND CONCLUSION

In this paper, in order to investigate correlations among the measures, we applied four
centrality measures (the degree, the closeness, the betweenness, and the eigenvector
centrality) to various types of complex networks as well as to the random network. We found
that the degree was strongly correlated with the betweenness, and the correlation was robust
in the sense that the extent of correlation was little affected by the presence of the other
measures. This finding was confirmed by estimating the partial correlation between the degree
and the betweenness, while holding either the eigenvector or the closeness constant.

Based on the strong correlation between the two measures, we further examined
the characteristics of the betweenness. Not only for scale-free networks but also for other
types of networks, the conditional distribution of the betweenness given the degree was
approximately a power law which, in turn, played a predominant role in understanding the
power-law distribution of the betweenness. This feature was distinct from the random
networks, in which the conditional distribution was roughly Gaussian.

Within complex networks, the scale-free network by itself implies the existence of a
hierarchy with respect to the degree centrality [31]. Similarly, the power-law distribution of
the betweenness may suggest a new potential role for the betweenness in quantifying the
hierarchy in conjunction with the community structure. Therefore, it may be
feasible to use the betweenness and/or related quantities as a measure for constructing
hierarchical and community structures of complex networks.

TABLE I: Correlation coefficients and corresponding partial correlations (in parentheses)
between pairs of centrality measures for each network. X stands for the degree centrality,
while Y, Z, and W stand for the betweenness, the eigenvector, and the closeness centrality,
respectively. Note that the notation for the partial correlation is abbreviated in such a way
that the corresponding two variables are replaced by a "big dot".

Network           R_XY            R_XZ            R_YZ            R_XW            R_YW            R_ZW
                  (R_•Z, R_•W)    (R_•Y, R_•W)    (R_•X, R_•W)    (R_•Y, R_•Z)    (R_•X, R_•Z)    (R_•X, R_•Y)

Film actor        0.81            0.61            0.26            0.31            0.20            0.22
                  (0.85, 0.81)    (0.71, 0.59)    (-0.50, 0.23)   (0.27, 0.23)    (-0.10, 0.15)   (0.04, 0.18)
Internet (AS)     0.98            0.82            0.79            0.19            0.16            0.60
                  (0.94, 0.98)    (0.38, 0.91)    (-0.12, 0.88)   (0.16, -0.68)   (-0.12, -0.65)  (0.80, 0.79)
Internet (router) 0.58            0.36            0.23            0.29            0.15            0.12
                  (0.55, 0.57)    (0.28, 0.34)    (0.03, 0.21)    (0.26, 0.27)    (-0.03, 0.13)   (0.02, 0.09)
Collaboration     0.72            0.53            0.26            0.56            0.40            0.33
                  (0.71, 0.65)    (0.52, 0.45)    (-0.21, 0.14)   (0.43, 0.49)    (-0.00, 0.35)   (0.04, 0.25)
Neural network    0.73            0.95            0.58            0.90            0.58            0.91
                  (0.70, 0.59)    (0.95, 0.74)    (-0.53, 0.17)   (0.86, 0.29)    (-0.26, 0.15)   (0.37, 0.86)
S. cerevisiae     0.88            0.82            0.62            0.57            0.34            0.68
                  (0.83, 0.89)    (0.74, 0.72)    (-0.38, 0.57)   (0.59, 0.02)    (-0.40, -0.14)  (0.45, 0.63)
E. coli           0.82            0.75            0.57            0.20            0.18            0.68
                  (0.73, 0.82)    (0.60, 0.86)    (-0.12, 0.62)   (0.08, -0.65)   (0.04, -0.34)   (0.82, 0.72)
C. elegans        0.96            0.74            0.71            0.41            0.37            0.60
                  (0.92, 0.95)    (0.32, 0.68)    (-0.03, 0.65)   (0.22, -0.05)   (-0.10, -0.09)  (0.47, 0.51)
D. melanogaster   0.91            0.91            0.80            0.69            0.51            0.71
                  (0.74, 0.90)    (0.72, 0.81)    (-0.16, 0.72)   (0.65, 0.15)    (-0.42, -0.15)  (0.28, 0.59)
H. pylori         0.94            0.86            0.82            0.68            0.60            0.80
                  (0.80, 0.91)    (0.46, 0.72)    (0.06, 0.70)    (0.42, -0.03)   (-0.15, -0.16)  (0.57, 0.67)
H. sapiens        0.73            0.52            0.20            0.37            0.39            0.10
                  (0.75, 0.69)    (0.56, 0.52)    (-0.31, 0.18)   (0.13, 0.37)    (0.19, 0.38)    (-0.12, 0.02)

Average           0.82            0.72            0.53            0.47            0.35            0.52
                  (0.78, 0.79)    (0.57, 0.67)    (-0.21, 0.46)   (0.37, 0.04)    (-0.12, -0.03)  (0.34, 0.48)

TABLE II: The correlation coefficients and partial correlations between all possible pairs of
centrality measures estimated for the random network with different numbers of vertices,
N = 1000, 3000, and 6000. In all cases, the average degree is $\langle k \rangle = 10$. X stands
for the degree centrality, while Y, Z, and W stand for the betweenness, the eigenvector, and
the closeness centrality, respectively. The notation for the partial correlation is
abbreviated as in Table I.

N       R_XY            R_XZ            R_YZ            R_XW            R_YW            R_ZW
        (R_•Z, R_•W)    (R_•Y, R_•W)    (R_•X, R_•W)    (R_•Y, R_•Z)    (R_•X, R_•Z)    (R_•X, R_•Y)

1000    0.97            0.95            0.94            0.92            0.90            0.97
        (0.76, 0.86)    (0.39, 0.61)    (0.27, 0.69)    (0.43, -0.07)   (0.05, -0.25)   (0.81, 0.86)
3000    0.98            0.95            0.94            0.93            0.88            0.96
        (0.82, 0.93)    (0.42, 0.56)    (0.14, 0.72)    (0.72, 0.21)    (-0.43, -0.23)  (0.67, 0.82)
6000    0.98            0.95            0.94            0.95            0.90            0.97
        (0.79, 0.89)    (0.45, 0.35)    (0.16, 0.60)    (0.77, 0.41)    (-0.43, -0.10)  (0.69, 0.83)

[Figure 1: log-log plot; horizontal axis: betweenness, b]

FIG. 1: (a) Log-log plots of the conditional distributions $P(b \mid k)$ of the betweenness b
given the degree k in the film actor network: k = 3 (■), k = 7 (•), and k = 10 (▲). The
least-squares fits (dotted lines) to the slopes for k = 3, k = 7, and k = 10 yield
-2.63 ± 0.12, -2.35 ± 0.10, and -2.51 ± 0.18, respectively. Plots for k = 7 and 10 are shifted
to the left for display purposes. (b) (inset) The plot of the exponent f(k) in Eq. (8) versus
the degree k. Estimated values from the least-squares fit to Eq. (9) are
$\alpha$ = 0.04 ± 0.01 and $\beta$ = -2.85 ± 0.05. The errors associated
with the fit are statistical uncertainties based on fitting a straight line.

FIG. 2: Log-log plots of betweenness distributions for selected complex networks: the film actor
network (■), the collaboration network (•), the protein interaction network of D. melanogaster
(▲), and the neural network of C. elegans (▼). Estimated exponents (dotted lines), by
least-squares fits to the slopes, are 2.36 ± 0.10, 2.27 ± 0.08, 2.11 ± 0.12, and 1.31 ± 0.11,
respectively. Plots, except for the film actor network, are shifted horizontally for display
purposes.

FIG. 3: Log-log plots of the conditional distributions $P(b \mid k)$ of the betweenness b given
the degree k of the scientific collaboration network: k = 4 (□), k = 6 (○), and k = 9 (△). The
least-squares fits (dotted lines) to the slopes for k = 4, k = 6, and k = 9 yield
-2.07 ± 0.15, -2.01 ± 0.13, and -2.22 ± 0.19, respectively. Plots for k = 6 and 9 are shifted
to the left for display purposes.



[Figure 4: horizontal axis: betweenness, b]

FIG. 4: Plots of the conditional distributions $P(b \mid k)$ of the betweenness b given the
degree k: k = 8 (■), k = 10 (•), k = 12 (□), and k = 14 (○) for the random network of 3000
vertices with average degree $\langle k \rangle = 10$. The distribution for each k is
normalized such that $b \to (b - \bar{b})/\sigma_b$, where $\bar{b}$ and $\sigma_b$ are the
mean and standard deviation of b. With this normalization, all distributions
collapse onto the standard Gaussian distribution (solid line).

FIG. 5: The distribution of the betweenness for the random network of 3000 vertices with
average degree $\langle k \rangle = 10$, together with the corresponding log-normal fit (solid
line). The scale and shape parameters of the log-normal fit are estimated using the maximum
likelihood estimate from the data. The estimated scale and shape parameters are
μ = e _a71 and σ = 0.57, respectively.
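
The maximum likelihood fit of a log-normal distribution mentioned in the caption of Fig. 5 has a closed form, sketched below. In this sketch mu and sigma denote the mean and standard deviation of the logarithm of the data, which is an assumed convention; the exponential of mu is sometimes the quantity reported as the scale parameter. The function names and the sample values are hypothetical.

```python
import math


def lognormal_mle(values):
    """Maximum likelihood estimates (mu, sigma) of a log-normal distribution.

    mu and sigma are the mean and standard deviation of ln(x). Zero or negative
    values have no logarithm and are skipped (an assumption of this sketch).
    """
    logs = [math.log(x) for x in values if x > 0]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((y - mu) ** 2 for y in logs) / n)
    return mu, sigma


def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) ** 2 / (2.0 * sigma ** 2)
    return math.exp(-z) / (x * sigma * math.sqrt(2.0 * math.pi))


# Hypothetical sample whose logarithms are -1, 0, and 1.
sample = [math.exp(-1.0), 1.0, math.exp(1.0)]
mu_hat, sigma_hat = lognormal_mle(sample)
print(mu_hat, sigma_hat, lognormal_pdf(1.0, mu_hat, sigma_hat))
```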